Collaborating Authors

 Northampton




Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

arXiv.org Artificial Intelligence

Sparse embeddings of data form an attractive class due to their inherent interpretability: every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS), which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in the BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically grounded sketching algorithm for sparse vectors that reduces their effective dimensionality while preserving inner product-induced ranks, to a geometric organization of the inverted index, to the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.
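A useful reference point for sparse ANNS is exact top-k inner-product search over a plain inverted index. The code below is a minimal sketch under our own assumptions: sparse vectors are Python dicts mapping a term id to a nonzero weight, and the function names are hypothetical. It does not implement the paper's sketching algorithm, its geometric organization of the inverted index, or any approximation.

```python
import heapq
from collections import defaultdict

# Assumption: a sparse vector is a dict mapping term id -> nonzero weight.
SparseVec = dict[int, float]


def build_inverted_index(docs: list[SparseVec]) -> dict[int, list[tuple[int, float]]]:
    """Map each term id to its posting list of (doc_id, weight) pairs."""
    index: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return dict(index)


def top_k_inner_product(
    index: dict[int, list[tuple[int, float]]], query: SparseVec, k: int
) -> list[tuple[int, float]]:
    """Exact top-k by inner product, accumulating scores one query term at a time."""
    scores: dict[int, float] = defaultdict(float)
    for term, q_weight in query.items():
        # Only documents sharing a nonzero term with the query are ever touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Return the k highest-scoring (doc_id, score) pairs, best first.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Every posting list touched by the query is scanned in full, so query cost grows with posting-list length; this exhaustive scan is the cost that approximate sparse retrieval methods try to reduce.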


Solving 2-D Helmholtz equation in the rectangular, circular, and elliptical domains using neural networks

arXiv.org Artificial Intelligence

Physics-informed neural networks have offered an alternative way to solve several differential equations that govern complicated physics. However, their success in predicting the acoustic field is limited by the vanishing-gradient problem that occurs when solving the Helmholtz equation. In this paper, we present a formulation that addresses this difficulty. The problem of solving the two-dimensional Helmholtz equation with prescribed boundary conditions is posed as an unconstrained optimization problem using the trial solution method. Following this method, a trial neural network that satisfies the given boundary conditions before training is constructed using transfinite interpolation and the theory of R-functions. This ansatz is first applied to the rectangular domain and then extended to the circular and elliptical domains. The acoustic field predicted by the proposed formulation is compared with that obtained from the two-dimensional finite element method. Good agreement is observed in all three domains. Minor limitations of the proposed formulation and their remedies are also discussed.
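The trial solution method builds the boundary conditions into the ansatz: u_t = g + B * N, where g matches the prescribed boundary values and B vanishes on the boundary, so u_t satisfies the conditions whatever the network N outputs, and training only has to minimize the Helmholtz residual. A minimal sketch, assuming homogeneous Dirichlet conditions (g = 0) and a fixed, hypothetical stand-in for N. The boundary factors below are simple illustrative choices, not the paper's transfinite-interpolation or R-function constructions, and derivatives use finite differences where a real physics-informed network would use automatic differentiation.

```python
import math
from typing import Callable

Func2D = Callable[[float, float], float]


def square_factor(x: float, y: float) -> float:
    """Vanishes on every edge of the unit square [0, 1] x [0, 1]."""
    return x * (1.0 - x) * y * (1.0 - y)


def ellipse_factor(a: float, b: float) -> Func2D:
    """Vanishes on the ellipse (x/a)^2 + (y/b)^2 = 1 and is positive inside it."""
    return lambda x, y: 1.0 - (x / a) ** 2 - (y / b) ** 2


def trial_solution(g: Func2D, factor: Func2D, net: Func2D) -> Func2D:
    """u_t = g + factor * net; on the boundary factor = 0, so u_t = g for any net."""
    return lambda x, y: g(x, y) + factor(x, y) * net(x, y)


def helmholtz_residual(u: Func2D, k: float, x: float, y: float, h: float = 1e-3) -> float:
    """Five-point finite-difference estimate of (Laplacian of u) + k^2 u at (x, y)."""
    laplacian = (
        u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4.0 * u(x, y)
    ) / h**2
    return laplacian + k**2 * u(x, y)
```

For instance, with `fake_net = lambda x, y: math.tanh(x - 2.0 * y)` (a hypothetical untrained network) the trial solution on the unit square is exactly zero on every edge, and `sin(pi x) sin(pi y)` with k = pi * sqrt(2) gives a residual close to zero, since its Laplacian equals -2 pi^2 times itself. Training would minimize the squared residual over interior collocation points.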


DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning

arXiv.org Artificial Intelligence

Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English using DeepSeek-R1. The responses of DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1, and OpenAI o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers considered incorrect. Reasoning ability was evaluated by analyzing the reasoning logic and the causes of reasoning errors. Results: DeepSeek-R1 demonstrated the highest overall accuracy, achieving 0.862 in Chinese MCQs and 0.808 in English MCQs. Gemini 2.0 Pro, OpenAI o1, and OpenAI o3-mini attained accuracies of 0.715, 0.685, and 0.692 in Chinese MCQs (all P<0.001 compared with DeepSeek-R1), and 0.746 (P=0.115), 0.723 (P=0.027), and 0.577 (P<0.001) in English MCQs, respectively. DeepSeek-R1 achieved the highest accuracy in five of the six topics in both Chinese and English MCQs. It also excelled in management questions posed in Chinese (all P<0.05). Reasoning ability analysis showed that the four LLMs shared similar reasoning logic. Ignoring key positive history, ignoring key positive signs, misinterpreting medical data, and excessive aggressiveness were the most common causes of reasoning errors. Conclusion: DeepSeek-R1 outperformed three other state-of-the-art LLMs in bilingual complex ophthalmology reasoning tasks. Although its clinical applicability remains challenging, it shows promise for supporting diagnosis and clinical decision-making.
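The scoring rule in Methods, where omissions and extra answers both count as incorrect, amounts to exact set matching between the options a model selects and the answer key. A minimal sketch, assuming each response and each key is a set of option letters (the function names are hypothetical, and the study's own scoring code is not described):

```python
def is_correct(response: set[str], key: set[str]) -> bool:
    """Exact match: a missing option (omission) or an extra option both fail."""
    return response == key


def accuracy(responses: list[set[str]], keys: list[set[str]]) -> float:
    """Proportion of questions answered exactly correctly."""
    if len(responses) != len(keys) or not keys:
        raise ValueError("need one response per question and at least one question")
    correct = sum(is_correct(r, k) for r, k in zip(responses, keys))
    return correct / len(keys)
```

Under this rule, an overall accuracy of 0.862 on 130 questions would correspond to 112 correct answers (112/130 is about 0.8615).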


Bots against Bias: Critical Next Steps for Human-Robot Interaction

arXiv.org Artificial Intelligence

We humans are biased, and our robotic creations are biased, too. Bias is a natural phenomenon that drives our perceptions and behavior, including when it comes to socially expressive robots that have humanlike features. Recognizing that we embed bias, knowingly or not, within the design of such robots is crucial to studying its implications for people in modern societies. In this chapter, I consider the multifaceted question of bias in the context of humanoid, AI-enabled, and expressive social robots: Where does bias arise, what does it look like, and what can (or should) we do about it? I offer observations on human-robot interaction (HRI) along two parallel tracks: (1) robots designed in bias-conscious ways and (2) robots that may help us tackle bias in the human world. I outline a curated selection of cases for each track drawn from the latest HRI research and positioned against social, legal, and ethical factors. I also propose a set of critical next steps to tackle the challenges and opportunities related to bias in HRI research and practice.


Neural Networks for Threshold Dynamics Reconstruction

arXiv.org Artificial Intelligence

We introduce two convolutional neural network (CNN) architectures, inspired by the Merriman-Bence-Osher (MBO) algorithm and by cellular automata, to model and learn threshold dynamics for front evolution from video data. The first model, termed the (single-dynamics) MBO network, learns a specific kernel and threshold for each input video without adapting to new dynamics, while the second, a meta-learning MBO network, generalizes across diverse threshold dynamics by adapting its parameters per input. Both models are evaluated on synthetic and real-world videos (ice melting and fire front propagation), with performance metrics indicating effective reconstruction and extrapolation of evolving boundaries, even under noisy conditions. Empirical results highlight the robustness of both networks across varied synthetic and real-world dynamics.
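The classical MBO scheme that both networks build on alternates two steps: diffuse the indicator function of the region by convolving it with a kernel (a Gaussian heat kernel in the original scheme), then threshold the result at 1/2. A minimal pure-Python sketch of one such step follows. The 3x3 box kernel and the `threshold` argument are hypothetical stand-ins for the kernel and threshold that the single-dynamics MBO network learns, and treating cells outside the grid as 0 (zero padding) is our own choice.

```python
Grid = list[list[float]]


def filter_same(field: Grid, kernel: Grid) -> Grid:
    """Same-size 2D correlation with zero padding (equal to convolution for
    symmetric kernels); the kernel must have odd side lengths."""
    n, m = len(field), len(field[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - ch, j + dj - cw
                    if 0 <= ii < n and 0 <= jj < m:  # cells outside the grid count as 0
                        acc += kernel[di][dj] * field[ii][jj]
            out[i][j] = acc
    return out


def mbo_step(indicator: Grid, kernel: Grid, threshold: float = 0.5) -> list[list[int]]:
    """One MBO step: diffuse the 0/1 indicator, then threshold it back to 0/1."""
    diffused = filter_same(indicator, kernel)
    return [[1 if v >= threshold else 0 for v in row] for row in diffused]


BOX_KERNEL = [[1.0 / 9.0] * 3 for _ in range(3)]  # hypothetical stand-in kernel
```

Applied to a 3x3 square of ones inside a 5x5 grid, one step with the box kernel removes the four corner cells and leaves a plus shape, illustrating how thresholded diffusion rounds off the high-curvature parts of a front.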


Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments

arXiv.org Artificial Intelligence

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments such as texts by leveraging generative artificial intelligence. Specifically, we propose using a deep generative model, such as a large language model (LLM), to efficiently generate treatments and using its internal representation for subsequent causal effect estimation. We show that knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other, possibly unknown, confounding features. Unlike existing methods, our approach eliminates the need to learn the causal representation from the data and hence produces more accurate and efficient estimates. We formally establish the conditions required for nonparametric identification of the average treatment effect, propose an estimation strategy that avoids violating the overlap assumption, and derive the asymptotic properties of the proposed estimator through double machine learning. Finally, using an instrumental variables approach, we extend the methodology to settings in which the treatment feature is based on human perception rather than being assumed fixed given the treatment object. The methodology also applies to text reuse, where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using text generated by an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.
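The estimator above is derived through double machine learning. As a generic illustration only, the sketch below computes the standard augmented inverse-probability-weighted (AIPW) estimate of the average treatment effect from already-fitted nuisance predictions: outcome predictions `mu0` and `mu1` under control and treatment, and propensity scores `e`. It is not the paper's estimator; it does not use LLM internal representations, and in a full double machine learning procedure the nuisance predictions would come from models fit on held-out folds (cross-fitting). The argument names are hypothetical.

```python
from statistics import fmean


def aipw_ate(
    y: list[float], t: list[int], mu0: list[float], mu1: list[float], e: list[float]
) -> float:
    """Average of the doubly robust (AIPW) score
    mu1 - mu0 + t (y - mu1) / e - (1 - t) (y - mu0) / (1 - e) over all units."""
    n = len(y)
    if n == 0 or not (len(t) == len(mu0) == len(mu1) == len(e) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    scores = []
    for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e):
        # Overlap: a propensity of exactly 0 or 1 makes the weights undefined.
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        scores.append(m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1.0 - ei))
    return fmean(scores)
```

Because the score is doubly robust, correct propensities can compensate for a poor outcome model: with `y = [3, 1, 3, 1]`, `t = [1, 0, 1, 0]`, all outcome predictions set to 0, and `e = 0.5` everywhere, the function returns 2.0, the true effect in that toy data.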


Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains

arXiv.org Artificial Intelligence

Hate speech depends heavily on cultural influences, which leads to varying individual interpretations. We therefore propose a Semantic Componential Analysis (SCA) framework for cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of definitions drawn from five domains (online dictionaries, research papers, Wikipedia articles, legislation, and online platforms), which we then decompose into semantic components. Our analysis reveals that the components differ from definition to definition, yet many domains borrow definitions from one another without taking the target culture into account. We conduct zero-shot experiments on our dataset with three popular open-source LLMs to understand the impact of different definitions on hate speech detection. Our findings indicate that LLMs are sensitive to definitions: their hate speech detection responses change according to the complexity of the definition used in the prompt.
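Once each definition is decomposed into a set of semantic components, how much two definitions differ can be quantified by set overlap. A minimal sketch, assuming components are represented as string labels and using Jaccard similarity as the overlap measure; both the measure and the component labels below are hypothetical illustrations, not the paper's SCA procedure or data.

```python
from itertools import combinations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Shared components divided by all components (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(defs: dict[str, frozenset[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every unordered pair of definitions, keyed in sorted name order."""
    return {(p, q): jaccard(defs[p], defs[q]) for p, q in combinations(sorted(defs), 2)}


# Hypothetical component annotations; not taken from the paper's dataset.
definitions = {
    "legislation": frozenset({"targets_group", "protected_attribute", "incites_violence"}),
    "platform": frozenset({"targets_group", "protected_attribute", "dehumanizing"}),
    "dictionary": frozenset({"targets_group", "expresses_hatred"}),
}
```

With these toy annotations, the legislation and platform definitions share half of their combined components (0.5), while the dictionary definition overlaps each of the others at only 0.25.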


Collaborative Comic Generation: Integrating Visual Narrative Theories with AI Models for Enhanced Creativity

arXiv.org Artificial Intelligence

This study presents a theory-inspired visual narrative generative system that integrates conceptual principles (comic-authoring idioms) with generative and language models to enhance the comic creation process. Our system combines human creativity with AI models that support parts of the generative process, providing a collaborative platform for creating comic content. These comic-authoring idioms, derived from prior human-created image sequences, serve as guidelines for crafting and refining stories. The system translates these principles into layers that facilitate comic creation through sequential decision-making, addressing narrative elements such as panel composition, changes in story tension, and panel transitions. Key contributions include integrating machine learning models into the human-AI cooperative comic generation process, applying abstract narrative theories to AI-driven comic creation, and providing a customizable tool for narrative-driven image sequences. This approach improves narrative elements in generated image sequences and engages human creativity in an AI-generative process of comics. We open-source the code at https://github.com/RimiChen/Collaborative_Comic_Generation.